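Generative adversarial networks (GANs) are powerful models that can synthesize data samples closely resembling the distribution of real data. However, the diversity of these generated samples is limited by the so-called mode collapse phenomenon observed in GANs. Conditional GANs are especially prone to mode collapse: they tend to ignore the input noise vector and focus on the conditional information. Recent methods proposed to mitigate this limitation increase the diversity of generated samples, but they degrade model performance when sample similarity is required. To address this shortcoming, we propose a novel method that selectively increases the diversity of GAN-generated samples. By adding a simple yet effective regularization term to the training loss function, we encourage the generator to discover new data modes for inputs associated with diverse outputs, while generating consistent samples for the remaining ones. More precisely, we maximize the ratio of the distance between generated images to the distance between the corresponding input latent vectors, scaling the effect according to the diversity of samples for a given conditional input. We show the superiority of our method on a synthetic benchmark as well as in a real-life scenario: simulating data from the Zero Degree Calorimeter of the ALICE experiment at the CERN LHC.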
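The regularization described above can be sketched as follows. This is a minimal illustration on plain Python lists, assuming a mean absolute (L1) distance and a hypothetical per-condition `diversity_weight`; the abstract does not specify the distance metric or how the scaling factor is computed, and a real implementation would operate on tensors inside an autodiff framework.

```python
from typing import Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def selective_diversity_term(
    img_1: Sequence[float],
    img_2: Sequence[float],
    z_1: Sequence[float],
    z_2: Sequence[float],
    diversity_weight: float,
    eps: float = 1e-8,
) -> float:
    """Hypothetical regularization term to be ADDED to the generator loss.

    ratio = d(G(c, z1), G(c, z2)) / d(z1, z2) measures how strongly the output
    changes with the latent vector for the same condition c. The negative is
    returned so that minimizing the loss maximizes the ratio.
    diversity_weight (assumed >= 0) scales the effect per condition: close to 0
    where real samples for c are consistent, larger where they are diverse.
    """
    ratio = l1_distance(img_1, img_2) / (l1_distance(z_1, z_2) + eps)
    return -diversity_weight * ratio


# Usage: two generated "images" for the same condition from two latent vectors.
term = selective_diversity_term([0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.5, 0.5], diversity_weight=2.0)
```

Setting `diversity_weight` to zero for a condition switches the regularizer off, which is one way to read "selectively": consistent conditions are left to the ordinary adversarial loss.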
In this paper, we present a novel visual SLAM and long-term localization benchmark for autonomous driving in challenging conditions based on the large-scale 4Seasons dataset. The proposed benchmark provides drastic appearance variations caused by seasonal changes and diverse weather and illumination conditions. While significant progress has been made in advancing visual SLAM on small-scale datasets with similar conditions, there is still a lack of unified benchmarks representative of real-world scenarios for autonomous driving. We introduce a new unified benchmark for jointly evaluating visual odometry, global place recognition, and map-based visual localization performance which is crucial to successfully enable autonomous driving in any condition. The data has been collected for more than one year, resulting in more than 300 km of recordings in nine different environments ranging from a multi-level parking garage to urban (including tunnels) to countryside and highway. We provide globally consistent reference poses with up to centimeter-level accuracy obtained from the fusion of direct stereo-inertial odometry with RTK GNSS. We evaluate the performance of several state-of-the-art visual odometry and visual localization baseline approaches on the benchmark and analyze their properties. The experimental results provide new insights into current approaches and show promising potential for future research. Our benchmark and evaluation protocols will be available at https://www.4seasons-dataset.com/.
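A common way to score visual odometry against reference poses such as these is the absolute trajectory error (ATE). The sketch below computes ATE as a root-mean-square position error; it assumes the estimated and reference trajectories are already time-associated and aligned in the same frame (the usual SE(3) or Sim(3) alignment step is omitted), and whether the benchmark uses exactly this metric is not stated in the abstract.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float, float]


def ate_rmse(estimated: List[Position], reference: List[Position]) -> float:
    """RMSE of Euclidean position errors between associated poses.

    Assumes frame-by-frame association and a common coordinate frame;
    trajectory alignment is intentionally left out of this sketch.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("need two non-empty trajectories of equal length")
    squared_errors = [
        sum((e - r) ** 2 for e, r in zip(est, ref))
        for est, ref in zip(estimated, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Usage: a single pose that is 3 m off in x and 4 m off in y.
error = ate_rmse([(3.0, 4.0, 0.0)], [(0.0, 0.0, 0.0)])
```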
Inspired by foundational studies in classical and quantum physics, and by information retrieval studies in quantum information theory, we have recently proved that the notions of 'energy' and 'entropy' can be consistently introduced in human language and, more generally, in human culture. More explicitly, if energy is attributed to words according to their frequency of appearance in a text, then the ensuing energy levels are distributed non-classically: they obey Bose-Einstein, rather than Maxwell-Boltzmann, statistics, as a consequence of the genuine 'quantum indistinguishability' of the words that appear in the text. Secondly, the 'quantum entanglement' due to the way meaning is carried by a text reduces the (von Neumann) entropy of the words that appear in the text, a behaviour that cannot be explained within classical (thermodynamic or information) entropy. We claim here that this 'quantum-type behaviour is valid in general in human cognition', namely, any text is conceptually more concrete than the words composing it, which entails that the entropy of the overall text decreases. This result can be extended to human culture and its collaborative entities, which have lower entropy than their constituent elements. We use these findings to propose the development of a new 'non-classical thermodynamic theory for human cognition and human culture', which bridges concepts and quantum entities and agrees with some recent findings on the conceptual, not physical, nature of quantum entities.
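The energy assignment and the two statistics contrasted above can be sketched as follows. The sketch assumes the convention that the most frequent word occupies the lowest energy level (level = frequency rank, starting at 0), with ties broken alphabetically; the constants `a` and `b` would normally be fitted to the observed word counts, and that fitting step is omitted.

```python
import math
from collections import Counter
from typing import List, Tuple


def energy_levels(text: str) -> List[Tuple[str, int, int]]:
    """Assign energy level i to the i-th most frequent word (0 = most frequent).

    Returns (word, energy_level, count) triples. Ties are broken
    alphabetically so the result is deterministic (an assumption).
    """
    counts = Counter(text.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, level, count) for level, (word, count) in enumerate(ranked)]


def bose_einstein(energy: float, a: float, b: float) -> float:
    """Expected occupation 1 / (A * exp(E / B) - 1); needs A * exp(E / B) > 1."""
    denom = a * math.exp(energy / b) - 1.0
    if denom <= 0:
        raise ValueError("A * exp(E / B) must exceed 1")
    return 1.0 / denom


def maxwell_boltzmann(energy: float, a: float, b: float) -> float:
    """Expected occupation 1 / (A * exp(E / B))."""
    return 1.0 / (a * math.exp(energy / b))


levels = energy_levels("the cat and the dog and the bird")
```

For the same `a` and `b`, the Bose-Einstein curve lies above the Maxwell-Boltzmann one and the gap is largest at the lowest levels, which is where the most frequent words sit under this convention.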
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while receiving prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Multimodal image-text models have shown remarkable performance in the past few years. However, evaluating their robustness against distribution shifts is crucial before adopting them in real-world applications. In this paper, we investigate the robustness of 9 popular open-sourced image-text models under common perturbations on five tasks (image-text retrieval, visual reasoning, visual entailment, image captioning, and text-to-image generation). In particular, we propose several new multimodal robustness benchmarks by applying 17 image perturbation and 16 text perturbation techniques on top of existing datasets. We observe that multimodal models are not robust to image and text perturbations, especially to image perturbations. Among the tested perturbation methods, character-level perturbations constitute the most severe distribution shift for text, and zoom blur is the most severe shift for image data. We also introduce two new robustness metrics (MMI and MOR) for proper evaluations of multimodal models. We hope our extensive study sheds light on new directions for the development of robust multimodal models.
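Character-level text perturbations of the kind mentioned above can be sketched as a small noise function. This is a generic example (random insertion, deletion, substitution, or swap of letters), not a reproduction of any of the paper's 16 text perturbations; `rate` and `seed` are illustrative parameters.

```python
import random
import string


def char_perturb(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly insert, delete, substitute, or swap alphabetic characters.

    Each alphabetic character is perturbed with probability `rate`.
    Non-alphabetic characters (spaces, punctuation) are never created or
    deleted, only possibly moved by a swap.
    """
    rng = random.Random(seed)  # seeded for reproducible perturbations
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["insert", "delete", "substitute", "swap"])
            if op == "insert":
                out.extend([c, rng.choice(string.ascii_lowercase)])
            elif op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])
                i += 1  # the next character was consumed by the swap
            elif op == "delete":
                pass  # drop the character
            else:
                out.append(c)  # swap requested at the last position: keep as-is
        else:
            out.append(c)
        i += 1
    return "".join(out)


noisy = char_perturb("a photo of a dog on the beach", rate=0.3, seed=1)
```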
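Object detection networks have reached an impressive level of performance, but a lack of suitable data in specific applications often limits them in practice. Typically, additional data sources are used to support the training task. In doing so, however, domain gaps between the different data sources pose a challenge for deep learning. GAN-based image-to-image style transfer is commonly applied to shrink the domain gap, but it is unstable and decoupled from the object detection task. We propose AWADA, an Attention-Weighted Adversarial Domain Adaptation framework that creates a feedback loop between the style transformation and the detection task. By constructing foreground object attention maps from object detector proposals, we focus the transformation on foreground object regions and stabilize style-transfer training. In extensive experiments and ablation studies, we show that AWADA reaches state-of-the-art unsupervised domain adaptation object detection performance on commonly used benchmarks for tasks such as synthetic-to-real, adverse weather, and cross-camera adaptation.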
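Building a foreground attention map from detector proposals can be sketched as below. The sketch assumes axis-aligned boxes `(x_min, y_min, x_max, y_max)` with exclusive upper bounds, a hypothetical `score_thresh`, and max-pooling of overlapping proposal scores; how AWADA combines proposals and where exactly the map enters the style-transfer loss may differ. The second helper shows one plausible use: weighting a per-pixel loss by the map.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def attention_map(
    height: int,
    width: int,
    proposals: List[Tuple[Box, float]],
    score_thresh: float = 0.5,
) -> List[List[float]]:
    """Foreground attention map from (box, score) detector proposals.

    Pixels inside a proposal with score >= score_thresh receive that score
    (maximum over overlapping proposals); background pixels stay 0.
    Box coordinates are clipped to the image bounds.
    """
    att = [[0.0] * width for _ in range(height)]
    for (x0, y0, x1, y1), score in proposals:
        if score < score_thresh:
            continue
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                att[y][x] = max(att[y][x], score)
    return att


def attention_weighted_mean(
    per_pixel_loss: List[List[float]], att: List[List[float]], eps: float = 1e-8
) -> float:
    """Average a per-pixel loss using the attention map as weights."""
    num = sum(l * a for lrow, arow in zip(per_pixel_loss, att) for l, a in zip(lrow, arow))
    den = sum(a for arow in att for a in arow)
    return num / (den + eps)


m = attention_map(4, 4, [((0, 0, 2, 2), 0.9), ((1, 1, 3, 3), 0.6), ((0, 0, 4, 4), 0.2)])
```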
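Like many team sports, basketball involves two groups of players who engage in collaborative and adversarial activities to win a game. Players and teams execute various complex strategies to gain an advantage over their opponents. Defining, identifying, and analyzing different types of activities is an important task in sports analytics, as it can lead to better strategies and decisions by players and coaching staff. The objective of this paper is to automatically recognize basketball group activities from tracking data representing the locations of the players and the ball. We propose a novel deep learning approach for group activity recognition (GAR) in team sports, called NETS. To efficiently model player relations in team sports, we combine a Transformer-based architecture with LSTM embeddings and a team-wise pooling layer to recognize group activities. Training such a neural network generally requires a large amount of annotated data, which incurs high labeling costs. To address the scarcity of manual labels, we generate weak labels and pretrain the neural network on a self-supervised trajectory prediction task. We used a large tracking dataset from 632 NBA games to evaluate our approach. The results show that NETS is capable of learning group activities with high accuracy, and that self-supervised training in NETS has a positive impact on GAR accuracy.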
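A team-wise pooling layer can be sketched as pooling per-player embeddings within each team and concatenating the results, so the output has a fixed size regardless of player order. The sketch assumes mean pooling and a `team_ids` list aligned with the players; the abstract does not state which pooling operator NETS uses or how the ball embedding is handled, so both are assumptions here.

```python
from typing import Dict, List, Sequence


def team_pooling(
    player_embeddings: Sequence[Sequence[float]], team_ids: Sequence[int]
) -> List[float]:
    """Mean-pool per-player embeddings within each team, then concatenate.

    team_ids[i] is the team of player i. Teams are concatenated in ascending
    id order so the output layout is fixed and independent of player order.
    """
    if len(player_embeddings) != len(team_ids):
        raise ValueError("one team id per player is required")
    groups: Dict[int, List[Sequence[float]]] = {}
    for emb, team in zip(player_embeddings, team_ids):
        groups.setdefault(team, []).append(emb)
    pooled: List[float] = []
    for team in sorted(groups):
        members = groups[team]
        dim = len(members[0])
        pooled.extend(sum(m[d] for m in members) / len(members) for d in range(dim))
    return pooled


# Usage: four players with 2-D embeddings, two per team.
features = team_pooling([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [20.0, 2.0]], [0, 0, 1, 1])
```

Pooling within a team makes the representation invariant to the ordering of players inside that team, while keeping the two teams distinguishable.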
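Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) have been studied across different research programs, resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting discrepancies in their messages that we address empirically, and we provide recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune 31k networks from nine different architectures. Our findings confirm that in- and out-of-distribution accuracies tend to increase jointly, but show that their relation is largely dataset-dependent, and in general more nuanced and more complex than posited by previous, smaller-scale studies.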
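Calibration error, one of the evaluation targets listed above, is often measured with the binned expected calibration error (ECE): predictions are grouped into equal-width confidence bins, and the gaps between each bin's accuracy and mean confidence are averaged, weighted by bin size. The sketch below implements that standard estimator; the abstract does not say which calibration estimator or bin count the study uses, so `n_bins=10` is an assumption.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sum over bins of (|B| / n) * |acc(B) - conf(B)|."""
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("need equally many (non-zero) confidences and labels")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        acc = sum(ok for _, ok in b) / len(b)
        conf = sum(c for c, _ in b) / len(b)
        ece += len(b) / n * abs(acc - conf)
    return ece


# Usage: a model that is 90% confident but right only half the time.
overconfident = expected_calibration_error([0.9, 0.9], [True, False])
```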
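Compared with CNNs for classification, segmentation, or object detection, the objectives and methods of generative networks are fundamentally different. Originally, they were not intended as image analysis tools but as a means to generate natural-looking images. The adversarial training paradigm was proposed to stabilize generative methods and has proven to be highly successful, though it was by no means the first attempt. This chapter gives a basic introduction to the motivation behind generative adversarial networks (GANs) and traces their path to success by abstracting the basic task and working mechanism and by deriving the difficulties of early practical approaches. Methods for more stable training will be presented, as will typical signs of poor convergence and their causes. Although this chapter focuses on GANs for image generation and image analysis, the adversarial training paradigm itself is not specific to images and also generalizes to other tasks in image analysis. Examples of architectures for semantic image segmentation and anomaly detection will be presented before GANs are contrasted with further generative modeling approaches that have recently entered the scene. This will allow a contextualized view of the limitations, but also of the benefits, of GANs.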
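The basic adversarial objective, and one classic difficulty of early GAN training, can be sketched from discriminator outputs alone. The functions below use the standard formulation of Goodfellow et al. (discriminator loss, the original minimax generator loss, and the widely used non-saturating variant); this is textbook material, not necessarily the chapter's own notation. Writing D = sigmoid(s) for a discriminator logit s, the gradient of log(1 - D) with respect to s has magnitude D, while that of -log D has magnitude 1 - D: when the discriminator confidently rejects fakes early in training (D close to 0), the minimax generator gradient vanishes.

```python
import math
from typing import Sequence, Tuple


def _log(x: float, eps: float = 1e-12) -> float:
    """Numerically guarded natural log."""
    return math.log(max(x, eps))


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """-E[log D(x)] - E[log(1 - D(G(z)))], with D outputs in (0, 1)."""
    real = -sum(_log(p) for p in d_real) / len(d_real)
    fake = -sum(_log(1.0 - p) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss_minimax(d_fake: Sequence[float]) -> float:
    """E[log(1 - D(G(z)))]: the original minimax objective, minimized by G."""
    return sum(_log(1.0 - p) for p in d_fake) / len(d_fake)


def generator_loss_non_saturating(d_fake: Sequence[float]) -> float:
    """-E[log D(G(z))]: the common variant with stronger early gradients."""
    return -sum(_log(p) for p in d_fake) / len(d_fake)


def logit_gradient_magnitudes(s: float) -> Tuple[float, float]:
    """|d/ds log(1 - D)| = D and |d/ds (-log D)| = 1 - D, for D = sigmoid(s)."""
    d = 1.0 / (1.0 + math.exp(-s))
    return d, 1.0 - d


# Early training: the discriminator assigns fakes a very low logit.
minimax_grad, non_saturating_grad = logit_gradient_magnitudes(-10.0)
```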
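Determining whether a brain is developing normally is a key component of pediatric neuroradiology and neurology. Brain magnetic resonance imaging (MRI) of infants shows a specific developmental pattern beyond myelination. While radiologists use myelination patterns, brain morphology, and size characteristics to determine age-adequate brain maturity, this requires years of experience in pediatric neuroradiology. In the absence of standardized criteria, visual estimation of structural brain maturity on MRI before the age of three remains subject to inter- and intra-observer variability. A more objective estimation of brain developmental age could help physicians identify many neurodevelopmental conditions and diseases earlier. However, such data are naturally difficult to obtain, and the observers' ground truth is not a gold standard because of the subjectivity of the assessment. In this light, we explore the general feasibility of tackling this task and the utility of different approaches, including two-dimensional and three-dimensional convolutional neural networks (CNNs) trained on a fusion of T1-weighted, T2-weighted, and proton density (PD)-weighted sequences from 84 individual subjects, divided into four age groups from birth to 3 years of age. With the best-performing approach, a 2D CNN applied to a central axial thick slab, we achieved an accuracy of 0.90 [95% CI: 0.86-0.94]. We discuss the comparison with 3D networks and show how the performance compares with using only one sequence (T1w). In conclusion, despite the theoretical advantages of 3D CNN approaches, with limited data such approaches at best roughly match simpler architectures. The code is available at https://github.com/shabanian2018/age_mri-classification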
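With only 84 subjects, an accuracy estimate such as 0.90 comes with substantial uncertainty, which is why a confidence interval is reported. The abstract does not say how the 95% CI was computed; the sketch below shows one common option, a percentile bootstrap over per-prediction correctness, with hypothetical `n_resamples` and `seed` parameters.

```python
import random
from typing import Sequence, Tuple


def bootstrap_accuracy_ci(
    correct: Sequence[bool],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Accuracy plus a percentile-bootstrap (1 - alpha) confidence interval.

    Resamples predictions with replacement, recomputes accuracy each time,
    and reads the interval bounds off the sorted resampled accuracies.
    """
    n = len(correct)
    if n == 0:
        raise ValueError("need at least one prediction")
    rng = random.Random(seed)  # seeded so the interval is reproducible
    point = sum(correct) / n
    stats = sorted(
        sum(rng.choice(correct) for _ in range(n)) / n for _ in range(n_resamples)
    )
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return point, lo, hi


# Usage: 90 correct and 10 incorrect predictions.
acc, ci_low, ci_high = bootstrap_accuracy_ci([True] * 90 + [False] * 10)
```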